Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 205 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 19.3 KiB |
| Average record size in memory | 96.6 B |
Variable types
| Categorical | 1 |
|---|---|
| Numeric | 11 |
name has a high cardinality: 147 distinct values | High cardinality |
wheelbase is highly correlated with carlength and 7 other fields | High correlation |
carlength is highly correlated with wheelbase and 8 other fields | High correlation |
carwidth is highly correlated with wheelbase and 8 other fields | High correlation |
curbweight is highly correlated with wheelbase and 9 other fields | High correlation |
cylindernumber is highly correlated with curbweight and 5 other fields | High correlation |
enginesize is highly correlated with wheelbase and 9 other fields | High correlation |
boreratio is highly correlated with wheelbase and 8 other fields | High correlation |
horsepower is highly correlated with wheelbase and 9 other fields | High correlation |
citympg is highly correlated with carlength and 8 other fields | High correlation |
highwaympg is highly correlated with wheelbase and 9 other fields | High correlation |
price is highly correlated with wheelbase and 9 other fields | High correlation |
wheelbase is highly correlated with carlength and 5 other fields | High correlation |
carlength is highly correlated with wheelbase and 8 other fields | High correlation |
carwidth is highly correlated with wheelbase and 9 other fields | High correlation |
curbweight is highly correlated with wheelbase and 9 other fields | High correlation |
cylindernumber is highly correlated with carwidth and 4 other fields | High correlation |
enginesize is highly correlated with wheelbase and 9 other fields | High correlation |
boreratio is highly correlated with carlength and 7 other fields | High correlation |
horsepower is highly correlated with carlength and 8 other fields | High correlation |
citympg is highly correlated with carlength and 7 other fields | High correlation |
highwaympg is highly correlated with wheelbase and 8 other fields | High correlation |
price is highly correlated with wheelbase and 9 other fields | High correlation |
wheelbase is highly correlated with carlength and 3 other fields | High correlation |
carlength is highly correlated with wheelbase and 6 other fields | High correlation |
carwidth is highly correlated with wheelbase and 6 other fields | High correlation |
curbweight is highly correlated with wheelbase and 7 other fields | High correlation |
enginesize is highly correlated with carlength and 6 other fields | High correlation |
horsepower is highly correlated with curbweight and 4 other fields | High correlation |
citympg is highly correlated with carlength and 6 other fields | High correlation |
highwaympg is highly correlated with carlength and 6 other fields | High correlation |
price is highly correlated with wheelbase and 7 other fields | High correlation |
wheelbase is highly correlated with carlength and 9 other fields | High correlation |
carlength is highly correlated with wheelbase and 9 other fields | High correlation |
carwidth is highly correlated with wheelbase and 9 other fields | High correlation |
curbweight is highly correlated with wheelbase and 9 other fields | High correlation |
cylindernumber is highly correlated with wheelbase and 9 other fields | High correlation |
enginesize is highly correlated with wheelbase and 9 other fields | High correlation |
boreratio is highly correlated with wheelbase and 9 other fields | High correlation |
horsepower is highly correlated with wheelbase and 9 other fields | High correlation |
citympg is highly correlated with wheelbase and 9 other fields | High correlation |
highwaympg is highly correlated with wheelbase and 9 other fields | High correlation |
price is highly correlated with wheelbase and 9 other fields | High correlation |
name is uniformly distributed | Uniform |
Reproduction
| Analysis started | 2022-09-07 17:42:45.983876 |
|---|---|
| Analysis finished | 2022-09-07 17:43:07.224481 |
| Duration | 21.24 seconds |
| Software version | pandas-profiling v3.2.0 |
name
| Distinct | 147 |
|---|---|
| Distinct (%) | 71.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.7 KiB |
| toyota corona | 6 |
|---|---|
| toyota corolla | 6 |
| peugeot 504 | 6 |
| subaru dl | 4 |
| mitsubishi mirage g4 | 3 |
| Other values (142) | 180 |
Length
| Max length | 31 |
|---|---|
| Median length | 24 |
| Mean length | 14.14634146 |
| Min length | 6 |
Characters and Unicode
| Total characters | 2900 |
|---|---|
| Distinct characters | 46 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
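As a minimal sketch of how such properties can be read, Python's standard `unicodedata` module returns the general category of each code point (for example `Ll` for Lowercase Letter, `Nd` for Decimal Number, `Zs` for Space Separator). The helper name `category_counts` is illustrative and is not part of the profiling tool; the two strings are taken from the sample rows of this report.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two values of the `name` variable from this report's sample rows.
for name in ["alfa-romero Quadrifoglio", "audi 5000s (diesel)"]:
    print(name, dict(category_counts(name)))
```

Summing such counters over every value of a column would give a per-category breakdown of the kind tabulated under "Most occurring categories".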
Unique
| Unique | 109 |
|---|---|
| Unique (%) | 53.2% |
Sample
| 1st row | alfa-romero giulia |
|---|---|
| 2nd row | alfa-romero stelvio |
| 3rd row | alfa-romero Quadrifoglio |
| 4th row | audi 100 ls |
| 5th row | audi 100ls |
Common Values
| Value | Count | Frequency (%) |
| toyota corona | 6 | 2.9% |
| toyota corolla | 6 | 2.9% |
| peugeot 504 | 6 | 2.9% |
| subaru dl | 4 | 2.0% |
| mitsubishi mirage g4 | 3 | 1.5% |
| mazda 626 | 3 | 1.5% |
| toyota mark ii | 3 | 1.5% |
| mitsubishi outlander | 3 | 1.5% |
| mitsubishi g4 | 3 | 1.5% |
| honda civic | 3 | 1.5% |
| Other values (137) | 165 | 80.5% |
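The Frequency (%) column is the count divided by the number of observations (205 in this report), shown to one decimal place. A small sketch of that arithmetic, using an illustrative helper name `frequency_pct` and counts copied from the table above:

```python
def frequency_pct(count, total):
    """Format `count` as a percentage of `total`, rounded to one decimal place."""
    return f"{100 * count / total:.1f}%"

n_observations = 205  # "Number of observations" in this report

for value, count in [("toyota corona", 6), ("subaru dl", 4), ("honda civic", 3)]:
    print(value, count, frequency_pct(count, n_observations))
```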
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| toyota | 31 | 6.4% |
| nissan | 18 | 3.7% |
| mazda | 15 | 3.1% |
| mitsubishi | 13 | 2.7% |
| honda | 13 | 2.7% |
| corolla | 12 | 2.5% |
| subaru | 12 | 2.5% |
| peugeot | 11 | 2.3% |
| volvo | 11 | 2.3% |
| sw | 10 | 2.0% |
| Other values (167) | 342 | 70.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 285 | 9.8% |
| a | 259 | 8.9% |
| o | 243 | 8.4% |
| t | 167 | 5.8% |
| e | 158 | 5.4% |
| s | 153 | 5.3% |
| i | 147 | 5.1% |
| l | 138 | 4.8% |
| r | 133 | 4.6% |
| c | 126 | 4.3% |
| Other values (36) | 1091 | 37.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2384 | 82.2% |
| Space Separator | 285 | 9.8% |
| Decimal Number | 179 | 6.2% |
| Close Punctuation | 13 | 0.4% |
| Dash Punctuation | 13 | 0.4% |
| Open Punctuation | 13 | 0.4% |
| Uppercase Letter | 13 | 0.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 259 | 10.9% |
| o | 243 | 10.2% |
| t | 167 | 7.0% |
| e | 158 | 6.6% |
| s | 153 | 6.4% |
| i | 147 | 6.2% |
| l | 138 | 5.8% |
| r | 133 | 5.6% |
| c | 126 | 5.3% |
| u | 126 | 5.3% |
| Other values (15) | 734 | 30.8% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 44 | 24.6% |
| 4 | 37 | 20.7% |
| 1 | 23 | 12.8% |
| 2 | 21 | 11.7% |
| 5 | 18 | 10.1% |
| 9 | 12 | 6.7% |
| 6 | 12 | 6.7% |
| 3 | 10 | 5.6% |
| 7 | 2 | 1.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 4 | 30.8% |
| D | 3 | 23.1% |
| U | 1 | 7.7% |
| X | 1 | 7.7% |
| Q | 1 | 7.7% |
| V | 1 | 7.7% |
| C | 1 | 7.7% |
| N | 1 | 7.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 285 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 13 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 13 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 13 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2397 | 82.7% |
| Common | 503 | 17.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 259 | 10.8% |
| o | 243 | 10.1% |
| t | 167 | 7.0% |
| e | 158 | 6.6% |
| s | 153 | 6.4% |
| i | 147 | 6.1% |
| l | 138 | 5.8% |
| r | 133 | 5.5% |
| c | 126 | 5.3% |
| u | 126 | 5.3% |
| Other values (23) | 747 | 31.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 285 | 56.7% |
| 0 | 44 | 8.7% |
| 4 | 37 | 7.4% |
| 1 | 23 | 4.6% |
| 2 | 21 | 4.2% |
| 5 | 18 | 3.6% |
| ) | 13 | 2.6% |
| - | 13 | 2.6% |
| ( | 13 | 2.6% |
| 9 | 12 | 2.4% |
| Other values (3) | 24 | 4.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2900 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 285 | 9.8% |
| a | 259 | 8.9% |
| o | 243 | 8.4% |
| t | 167 | 5.8% |
| e | 158 | 5.4% |
| s | 153 | 5.3% |
| i | 147 | 5.1% |
| l | 138 | 4.8% |
| r | 133 | 4.6% |
| c | 126 | 4.3% |
| Other values (36) | 1091 | 37.6% |
wheelbase
| Distinct | 53 |
|---|---|
| Distinct (%) | 25.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 98.75658537 |
| Minimum | 86.6 |
| Maximum | 120.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 86.6 |
|---|---|
| 5-th percentile | 93.02 |
| Q1 | 94.5 |
| median | 97 |
| Q3 | 102.4 |
| 95-th percentile | 110 |
| Maximum | 120.9 |
| Range | 34.3 |
| Interquartile range (IQR) | 7.9 |
Descriptive statistics
| Standard deviation | 6.021775685 |
|---|---|
| Coefficient of variation (CV) | 0.06097594062 |
| Kurtosis | 1.017038946 |
| Mean | 98.75658537 |
| Median Absolute Deviation (MAD) | 2.7 |
| Skewness | 1.050213776 |
| Sum | 20245.1 |
| Variance | 36.2617824 |
| Monotonicity | Not monotonic |
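Several figures in the two tables above are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the mean is the sum divided by the number of observations. The check below recomputes them as a sketch; the variable names are illustrative, and the inputs are copied from this report as printed, so agreement holds only up to that rounding.

```python
# Values copied from the tables above (205 observations in the dataset).
n = 205
total = 20245.1                   # Sum
minimum, maximum = 86.6, 120.9
q1, q3 = 94.5, 102.4
std = 6.021775685                 # Standard deviation

mean = total / n                  # reported Mean: 98.75658537
value_range = maximum - minimum   # reported Range: 34.3
iqr = q3 - q1                     # reported IQR: 7.9
variance = std ** 2               # reported Variance: 36.2617824
cv = std / mean                   # reported CV: 0.06097594062

print(mean, value_range, iqr, variance, cv)
```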
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 94.5 | 21 | 10.2% |
| 93.7 | 20 | 9.8% |
| 95.7 | 13 | 6.3% |
| 96.5 | 8 | 3.9% |
| 97.3 | 7 | 3.4% |
| 98.4 | 7 | 3.4% |
| 104.3 | 6 | 2.9% |
| 100.4 | 6 | 2.9% |
| 107.9 | 6 | 2.9% |
| 98.8 | 6 | 2.9% |
| Other values (43) | 105 | 51.2% |
| Value | Count | Frequency (%) |
| 86.6 | 2 | 1.0% |
| 88.4 | 1 | 0.5% |
| 88.6 | 2 | 1.0% |
| 89.5 | 3 | 1.5% |
| 91.3 | 2 | 1.0% |
| 93 | 1 | 0.5% |
| 93.1 | 5 | 2.4% |
| 93.3 | 1 | 0.5% |
| 93.7 | 20 | 9.8% |
| 94.3 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 120.9 | 1 | 0.5% |
| 115.6 | 2 | 1.0% |
| 114.2 | 4 | 2.0% |
| 113 | 2 | 1.0% |
| 112 | 1 | 0.5% |
| 110 | 3 | 1.5% |
| 109.1 | 5 | 2.4% |
| 108 | 1 | 0.5% |
| 107.9 | 6 | 2.9% |
| 106.7 | 1 | 0.5% |
carlength
| Distinct | 75 |
|---|---|
| Distinct (%) | 36.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 174.0492683 |
| Minimum | 141.1 |
| Maximum | 208.1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 141.1 |
|---|---|
| 5-th percentile | 157.14 |
| Q1 | 166.3 |
| median | 173.2 |
| Q3 | 183.1 |
| 95-th percentile | 196.36 |
| Maximum | 208.1 |
| Range | 67 |
| Interquartile range (IQR) | 16.8 |
Descriptive statistics
| Standard deviation | 12.33728853 |
|---|---|
| Coefficient of variation (CV) | 0.0708838862 |
| Kurtosis | -0.08289485345 |
| Mean | 174.0492683 |
| Median Absolute Deviation (MAD) | 6.9 |
| Skewness | 0.1559537713 |
| Sum | 35680.1 |
| Variance | 152.2086882 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 157.3 | 15 | 7.3% |
| 188.8 | 11 | 5.4% |
| 171.7 | 7 | 3.4% |
| 186.7 | 7 | 3.4% |
| 166.3 | 7 | 3.4% |
| 165.3 | 6 | 2.9% |
| 177.8 | 6 | 2.9% |
| 176.2 | 6 | 2.9% |
| 186.6 | 6 | 2.9% |
| 172 | 5 | 2.4% |
| Other values (65) | 129 | 62.9% |
| Value | Count | Frequency (%) |
| 141.1 | 1 | 0.5% |
| 144.6 | 2 | 1.0% |
| 150 | 3 | 1.5% |
| 155.9 | 3 | 1.5% |
| 156.9 | 1 | 0.5% |
| 157.1 | 1 | 0.5% |
| 157.3 | 15 | 7.3% |
| 157.9 | 1 | 0.5% |
| 158.7 | 3 | 1.5% |
| 158.8 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 208.1 | 1 | 0.5% |
| 202.6 | 2 | 1.0% |
| 199.6 | 2 | 1.0% |
| 199.2 | 1 | 0.5% |
| 198.9 | 4 | 2.0% |
| 197 | 1 | 0.5% |
| 193.8 | 1 | 0.5% |
| 192.7 | 3 | 1.5% |
| 191.7 | 1 | 0.5% |
| 190.9 | 2 | 1.0% |
carwidth
| Distinct | 44 |
|---|---|
| Distinct (%) | 21.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 65.90780488 |
| Minimum | 60.3 |
| Maximum | 72.3 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 60.3 |
|---|---|
| 5-th percentile | 63.6 |
| Q1 | 64.1 |
| median | 65.5 |
| Q3 | 66.9 |
| 95-th percentile | 70.46 |
| Maximum | 72.3 |
| Range | 12 |
| Interquartile range (IQR) | 2.8 |
Descriptive statistics
| Standard deviation | 2.145203853 |
|---|---|
| Coefficient of variation (CV) | 0.03254855562 |
| Kurtosis | 0.7027642441 |
| Mean | 65.90780488 |
| Median Absolute Deviation (MAD) | 1.4 |
| Skewness | 0.9040034988 |
| Sum | 13511.1 |
| Variance | 4.60189957 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=44)
| Value | Count | Frequency (%) |
| 63.8 | 24 | 11.7% |
| 66.5 | 23 | 11.2% |
| 65.4 | 15 | 7.3% |
| 63.6 | 11 | 5.4% |
| 64.4 | 10 | 4.9% |
| 68.4 | 10 | 4.9% |
| 64 | 9 | 4.4% |
| 65.5 | 8 | 3.9% |
| 65.2 | 7 | 3.4% |
| 64.2 | 6 | 2.9% |
| Other values (34) | 82 | 40.0% |
| Value | Count | Frequency (%) |
| 60.3 | 1 | 0.5% |
| 61.8 | 1 | 0.5% |
| 62.5 | 1 | 0.5% |
| 63.4 | 1 | 0.5% |
| 63.6 | 11 | 5.4% |
| 63.8 | 24 | 11.7% |
| 63.9 | 3 | 1.5% |
| 64 | 9 | 4.4% |
| 64.1 | 2 | 1.0% |
| 64.2 | 6 | 2.9% |
| Value | Count | Frequency (%) |
| 72.3 | 1 | 0.5% |
| 72 | 1 | 0.5% |
| 71.7 | 3 | 1.5% |
| 71.4 | 3 | 1.5% |
| 70.9 | 1 | 0.5% |
| 70.6 | 1 | 0.5% |
| 70.5 | 1 | 0.5% |
| 70.3 | 3 | 1.5% |
| 69.6 | 2 | 1.0% |
| 68.9 | 4 | 2.0% |
curbweight
| Distinct | 171 |
|---|---|
| Distinct (%) | 83.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2555.565854 |
| Minimum | 1488 |
| Maximum | 4066 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 1488 |
|---|---|
| 5-th percentile | 1901 |
| Q1 | 2145 |
| median | 2414 |
| Q3 | 2935 |
| 95-th percentile | 3503 |
| Maximum | 4066 |
| Range | 2578 |
| Interquartile range (IQR) | 790 |
Descriptive statistics
| Standard deviation | 520.6802035 |
|---|---|
| Coefficient of variation (CV) | 0.2037436064 |
| Kurtosis | -0.0428537661 |
| Mean | 2555.565854 |
| Median Absolute Deviation (MAD) | 386 |
| Skewness | 0.6813981891 |
| Sum | 523891 |
| Variance | 271107.8743 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2385 | 4 | 2.0% |
| 1918 | 3 | 1.5% |
| 2275 | 3 | 1.5% |
| 1989 | 3 | 1.5% |
| 2410 | 2 | 1.0% |
| 2191 | 2 | 1.0% |
| 2535 | 2 | 1.0% |
| 2024 | 2 | 1.0% |
| 2414 | 2 | 1.0% |
| 4066 | 2 | 1.0% |
| Other values (161) | 180 | 87.8% |
| Value | Count | Frequency (%) |
| 1488 | 1 | 0.5% |
| 1713 | 1 | 0.5% |
| 1819 | 1 | 0.5% |
| 1837 | 1 | 0.5% |
| 1874 | 2 | 1.0% |
| 1876 | 2 | 1.0% |
| 1889 | 1 | 0.5% |
| 1890 | 1 | 0.5% |
| 1900 | 1 | 0.5% |
| 1905 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 4066 | 2 | 1.0% |
| 3950 | 1 | 0.5% |
| 3900 | 1 | 0.5% |
| 3770 | 1 | 0.5% |
| 3750 | 1 | 0.5% |
| 3740 | 1 | 0.5% |
| 3715 | 1 | 0.5% |
| 3685 | 1 | 0.5% |
| 3515 | 1 | 0.5% |
| 3505 | 1 | 0.5% |
cylindernumber
| Distinct | 7 |
|---|---|
| Distinct (%) | 3.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.380487805 |
| Minimum | 2 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 4 |
| median | 4 |
| Q3 | 4 |
| 95-th percentile | 6 |
| Maximum | 12 |
| Range | 10 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.080853764 |
|---|---|
| Coefficient of variation (CV) | 0.2467427857 |
| Kurtosis | 13.71486634 |
| Mean | 4.380487805 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.817459025 |
| Sum | 898 |
| Variance | 1.168244859 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 4 | 159 | 77.6% |
| 6 | 24 | 11.7% |
| 5 | 11 | 5.4% |
| 8 | 5 | 2.4% |
| 2 | 4 | 2.0% |
| 3 | 1 | 0.5% |
| 12 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 2 | 4 | 2.0% |
| 3 | 1 | 0.5% |
| 4 | 159 | 77.6% |
| 5 | 11 | 5.4% |
| 6 | 24 | 11.7% |
| 8 | 5 | 2.4% |
| 12 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 12 | 1 | 0.5% |
| 8 | 5 | 2.4% |
| 6 | 24 | 11.7% |
| 5 | 11 | 5.4% |
| 4 | 159 | 77.6% |
| 3 | 1 | 0.5% |
| 2 | 4 | 2.0% |
enginesize
| Distinct | 44 |
|---|---|
| Distinct (%) | 21.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 126.9073171 |
| Minimum | 61 |
| Maximum | 326 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 61 |
|---|---|
| 5-th percentile | 90 |
| Q1 | 97 |
| median | 120 |
| Q3 | 141 |
| 95-th percentile | 201.2 |
| Maximum | 326 |
| Range | 265 |
| Interquartile range (IQR) | 44 |
Descriptive statistics
| Standard deviation | 41.64269344 |
|---|---|
| Coefficient of variation (CV) | 0.3281346923 |
| Kurtosis | 5.305682092 |
| Mean | 126.9073171 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 1.947655045 |
| Sum | 26016 |
| Variance | 1734.113917 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=44)
| Value | Count | Frequency (%) |
| 122 | 15 | 7.3% |
| 92 | 15 | 7.3% |
| 97 | 14 | 6.8% |
| 98 | 14 | 6.8% |
| 108 | 13 | 6.3% |
| 90 | 12 | 5.9% |
| 110 | 12 | 5.9% |
| 109 | 8 | 3.9% |
| 120 | 7 | 3.4% |
| 141 | 7 | 3.4% |
| Other values (34) | 88 | 42.9% |
| Value | Count | Frequency (%) |
| 61 | 1 | 0.5% |
| 70 | 3 | 1.5% |
| 79 | 1 | 0.5% |
| 80 | 1 | 0.5% |
| 90 | 12 | 5.9% |
| 91 | 5 | 2.4% |
| 92 | 15 | 7.3% |
| 97 | 14 | 6.8% |
| 98 | 14 | 6.8% |
| 103 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 326 | 1 | 0.5% |
| 308 | 1 | 0.5% |
| 304 | 1 | 0.5% |
| 258 | 2 | 1.0% |
| 234 | 2 | 1.0% |
| 209 | 3 | 1.5% |
| 203 | 1 | 0.5% |
| 194 | 3 | 1.5% |
| 183 | 4 | 2.0% |
| 181 | 6 | 2.9% |
boreratio
| Distinct | 38 |
|---|---|
| Distinct (%) | 18.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.329756098 |
| Minimum | 2.54 |
| Maximum | 3.94 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 2.54 |
|---|---|
| 5-th percentile | 2.97 |
| Q1 | 3.15 |
| median | 3.31 |
| Q3 | 3.58 |
| 95-th percentile | 3.78 |
| Maximum | 3.94 |
| Range | 1.4 |
| Interquartile range (IQR) | 0.43 |
Descriptive statistics
| Standard deviation | 0.2708437054 |
|---|---|
| Coefficient of variation (CV) | 0.08134040377 |
| Kurtosis | -0.7850418332 |
| Mean | 3.329756098 |
| Median Absolute Deviation (MAD) | 0.26 |
| Skewness | 0.0201564181 |
| Sum | 682.6 |
| Variance | 0.07335631277 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=38)
| Value | Count | Frequency (%) |
| 3.62 | 23 | 11.2% |
| 3.19 | 20 | 9.8% |
| 3.15 | 15 | 7.3% |
| 3.03 | 12 | 5.9% |
| 2.97 | 12 | 5.9% |
| 3.46 | 9 | 4.4% |
| 3.31 | 8 | 3.9% |
| 3.43 | 8 | 3.9% |
| 3.78 | 8 | 3.9% |
| 3.27 | 7 | 3.4% |
| Other values (28) | 83 | 40.5% |
| Value | Count | Frequency (%) |
| 2.54 | 1 | 0.5% |
| 2.68 | 1 | 0.5% |
| 2.91 | 7 | 3.4% |
| 2.92 | 1 | 0.5% |
| 2.97 | 12 | 5.9% |
| 2.99 | 1 | 0.5% |
| 3.01 | 5 | 2.4% |
| 3.03 | 12 | 5.9% |
| 3.05 | 6 | 2.9% |
| 3.08 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 3.94 | 2 | 1.0% |
| 3.8 | 2 | 1.0% |
| 3.78 | 8 | 3.9% |
| 3.76 | 1 | 0.5% |
| 3.74 | 3 | 1.5% |
| 3.7 | 5 | 2.4% |
| 3.63 | 2 | 1.0% |
| 3.62 | 23 | 11.2% |
| 3.61 | 1 | 0.5% |
| 3.6 | 1 | 0.5% |
horsepower
| Distinct | 59 |
|---|---|
| Distinct (%) | 28.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 104.1170732 |
| Minimum | 48 |
| Maximum | 288 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 48 |
|---|---|
| 5-th percentile | 62 |
| Q1 | 70 |
| median | 95 |
| Q3 | 116 |
| 95-th percentile | 180.8 |
| Maximum | 288 |
| Range | 240 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 39.54416681 |
|---|---|
| Coefficient of variation (CV) | 0.3798048255 |
| Kurtosis | 2.68400616 |
| Mean | 104.1170732 |
| Median Absolute Deviation (MAD) | 25 |
| Skewness | 1.405310154 |
| Sum | 21344 |
| Variance | 1563.741129 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 68 | 19 | 9.3% |
| 70 | 11 | 5.4% |
| 69 | 10 | 4.9% |
| 116 | 9 | 4.4% |
| 110 | 8 | 3.9% |
| 95 | 7 | 3.4% |
| 114 | 6 | 2.9% |
| 160 | 6 | 2.9% |
| 101 | 6 | 2.9% |
| 62 | 6 | 2.9% |
| Other values (49) | 117 | 57.1% |
| Value | Count | Frequency (%) |
| 48 | 1 | 0.5% |
| 52 | 2 | 1.0% |
| 55 | 1 | 0.5% |
| 56 | 2 | 1.0% |
| 58 | 1 | 0.5% |
| 60 | 1 | 0.5% |
| 62 | 6 | 2.9% |
| 64 | 1 | 0.5% |
| 68 | 19 | 9.3% |
| 69 | 10 | 4.9% |
| Value | Count | Frequency (%) |
| 288 | 1 | 0.5% |
| 262 | 1 | 0.5% |
| 207 | 3 | 1.5% |
| 200 | 1 | 0.5% |
| 184 | 2 | 1.0% |
| 182 | 3 | 1.5% |
| 176 | 2 | 1.0% |
| 175 | 1 | 0.5% |
| 162 | 2 | 1.0% |
| 161 | 2 | 1.0% |
citympg
| Distinct | 29 |
|---|---|
| Distinct (%) | 14.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.2195122 |
| Minimum | 13 |
| Maximum | 49 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 13 |
|---|---|
| 5-th percentile | 16 |
| Q1 | 19 |
| median | 24 |
| Q3 | 30 |
| 95-th percentile | 37 |
| Maximum | 49 |
| Range | 36 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 6.542141653 |
|---|---|
| Coefficient of variation (CV) | 0.2594079379 |
| Kurtosis | 0.5786483405 |
| Mean | 25.2195122 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.6637040288 |
| Sum | 5170 |
| Variance | 42.79961741 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=29)
| Value | Count | Frequency (%) |
| 31 | 28 | 13.7% |
| 19 | 27 | 13.2% |
| 24 | 22 | 10.7% |
| 27 | 14 | 6.8% |
| 17 | 13 | 6.3% |
| 26 | 12 | 5.9% |
| 23 | 12 | 5.9% |
| 21 | 8 | 3.9% |
| 25 | 8 | 3.9% |
| 30 | 8 | 3.9% |
| Other values (19) | 53 | 25.9% |
| Value | Count | Frequency (%) |
| 13 | 1 | 0.5% |
| 14 | 2 | 1.0% |
| 15 | 3 | 1.5% |
| 16 | 6 | 2.9% |
| 17 | 13 | 6.3% |
| 18 | 3 | 1.5% |
| 19 | 27 | 13.2% |
| 20 | 3 | 1.5% |
| 21 | 8 | 3.9% |
| 22 | 4 | 2.0% |
| Value | Count | Frequency (%) |
| 49 | 1 | 0.5% |
| 47 | 1 | 0.5% |
| 45 | 1 | 0.5% |
| 38 | 7 | 3.4% |
| 37 | 6 | 2.9% |
| 36 | 1 | 0.5% |
| 35 | 1 | 0.5% |
| 34 | 1 | 0.5% |
| 33 | 1 | 0.5% |
| 32 | 1 | 0.5% |
highwaympg
| Distinct | 30 |
|---|---|
| Distinct (%) | 14.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.75121951 |
| Minimum | 16 |
| Maximum | 54 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 16 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 25 |
| median | 30 |
| Q3 | 34 |
| 95-th percentile | 42.8 |
| Maximum | 54 |
| Range | 38 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 6.886443131 |
|---|---|
| Coefficient of variation (CV) | 0.2239404889 |
| Kurtosis | 0.4400703815 |
| Mean | 30.75121951 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.5399971879 |
| Sum | 6304 |
| Variance | 47.423099 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=30)
| Value | Count | Frequency (%) |
| 25 | 19 | 9.3% |
| 38 | 17 | 8.3% |
| 24 | 17 | 8.3% |
| 30 | 16 | 7.8% |
| 32 | 16 | 7.8% |
| 34 | 14 | 6.8% |
| 37 | 13 | 6.3% |
| 28 | 13 | 6.3% |
| 29 | 10 | 4.9% |
| 33 | 9 | 4.4% |
| Other values (20) | 61 | 29.8% |
| Value | Count | Frequency (%) |
| 16 | 2 | 1.0% |
| 17 | 1 | 0.5% |
| 18 | 2 | 1.0% |
| 19 | 2 | 1.0% |
| 20 | 2 | 1.0% |
| 22 | 8 | 3.9% |
| 23 | 7 | 3.4% |
| 24 | 17 | 8.3% |
| 25 | 19 | 9.3% |
| 26 | 3 | 1.5% |
| Value | Count | Frequency (%) |
| 54 | 1 | 0.5% |
| 53 | 1 | 0.5% |
| 50 | 1 | 0.5% |
| 47 | 2 | 1.0% |
| 46 | 2 | 1.0% |
| 43 | 4 | 2.0% |
| 42 | 3 | 1.5% |
| 41 | 3 | 1.5% |
| 39 | 2 | 1.0% |
| 38 | 17 | 8.3% |
price
| Distinct | 189 |
|---|---|
| Distinct (%) | 92.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13276.71057 |
| Minimum | 5118 |
| Maximum | 45400 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 KiB |
Quantile statistics
| Minimum | 5118 |
|---|---|
| 5-th percentile | 6197 |
| Q1 | 7788 |
| median | 10295 |
| Q3 | 16503 |
| 95-th percentile | 32472.4 |
| Maximum | 45400 |
| Range | 40282 |
| Interquartile range (IQR) | 8715 |
Descriptive statistics
| Standard deviation | 7988.852332 |
|---|---|
| Coefficient of variation (CV) | 0.6017192504 |
| Kurtosis | 3.051647871 |
| Mean | 13276.71057 |
| Median Absolute Deviation (MAD) | 3306 |
| Skewness | 1.777678156 |
| Sum | 2721725.667 |
| Variance | 63821761.58 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 8921 | 2 | 1.0% |
| 9279 | 2 | 1.0% |
| 7898 | 2 | 1.0% |
| 8916.5 | 2 | 1.0% |
| 7775 | 2 | 1.0% |
| 8845 | 2 | 1.0% |
| 7295 | 2 | 1.0% |
| 7609 | 2 | 1.0% |
| 6692 | 2 | 1.0% |
| 6229 | 2 | 1.0% |
| Other values (179) | 185 | 90.2% |
| Value | Count | Frequency (%) |
| 5118 | 1 | 0.5% |
| 5151 | 1 | 0.5% |
| 5195 | 1 | 0.5% |
| 5348 | 1 | 0.5% |
| 5389 | 1 | 0.5% |
| 5399 | 1 | 0.5% |
| 5499 | 1 | 0.5% |
| 5572 | 2 | 1.0% |
| 6095 | 1 | 0.5% |
| 6189 | 1 | 0.5% |
| Value | Count | Frequency (%) |
| 45400 | 1 | 0.5% |
| 41315 | 1 | 0.5% |
| 40960 | 1 | 0.5% |
| 37028 | 1 | 0.5% |
| 36880 | 1 | 0.5% |
| 36000 | 1 | 0.5% |
| 35550 | 1 | 0.5% |
| 35056 | 1 | 0.5% |
| 34184 | 1 | 0.5% |
| 34028 | 1 | 0.5% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
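The calculation described above can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then divide the covariance of the ranks by the product of their standard deviations. The function names are illustrative, and the profiling tool's own implementation may differ in detail. The data are the horsepower and citympg columns of the ten "First rows" of this report, so the result describes those ten rows only, not the full dataset.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # the 1/n factors cancel

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

horsepower = [111, 111, 154, 102, 115, 110, 110, 110, 140, 160]
citympg = [21, 21, 19, 24, 18, 19, 19, 19, 17, 16]
print(spearman_rho(horsepower, citympg))
```

Because only ranks enter the calculation, applying any strictly increasing transformation to either variable leaves ρ unchanged.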
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
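A minimal sketch of this formula, together with a check of the invariance property, is shown below. The function name `pearson_r` and the particular shift and scale constants are illustrative. The data are the wheelbase and carlength columns of the ten "First rows" of this report, so the value describes those rows only.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # the 1/n factors cancel

wheelbase = [88.6, 88.6, 94.5, 99.8, 99.4, 99.8, 105.8, 105.8, 105.8, 99.5]
carlength = [168.8, 168.8, 171.2, 176.6, 176.6, 177.3, 192.7, 192.7, 192.7, 178.2]

r = pearson_r(wheelbase, carlength)

# Rescale one variable by a positive factor and shift the other: r is unchanged.
r_transformed = pearson_r([2.54 * v for v in wheelbase],
                          [v + 10.0 for v in carlength])
print(r, r_transformed)
```

The invariance holds for positive scale factors; multiplying one variable by a negative factor flips the sign of r.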
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
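The pair-counting definition above can be sketched directly. Note that this literal form (often called τ-a) applies no correction for ties; statistical libraries commonly report a tie-corrected variant, so their values can differ when the data contain ties. The function name is illustrative, and the data are the horsepower and citympg columns of the ten "First rows" of this report.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:      # both variables move in the same direction
            concordant += 1
        elif sign < 0:    # the variables move in opposite directions
            discordant += 1
        # sign == 0: the pair is tied in x or y and counts as neither
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

horsepower = [111, 111, 154, 102, 115, 110, 110, 110, 140, 160]
citympg = [21, 21, 19, 24, 18, 19, 19, 19, 17, 16]
print(kendall_tau_a(horsepower, citympg))
```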
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | name | wheelbase | carlength | carwidth | curbweight | cylindernumber | enginesize | boreratio | horsepower | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alfa-romero giulia | 88.6 | 168.8 | 64.1 | 2548 | 4 | 130 | 3.47 | 111 | 21 | 27 | 13495.000 |
| 1 | alfa-romero stelvio | 88.6 | 168.8 | 64.1 | 2548 | 4 | 130 | 3.47 | 111 | 21 | 27 | 16500.000 |
| 2 | alfa-romero Quadrifoglio | 94.5 | 171.2 | 65.5 | 2823 | 6 | 152 | 2.68 | 154 | 19 | 26 | 16500.000 |
| 3 | audi 100 ls | 99.8 | 176.6 | 66.2 | 2337 | 4 | 109 | 3.19 | 102 | 24 | 30 | 13950.000 |
| 4 | audi 100ls | 99.4 | 176.6 | 66.4 | 2824 | 5 | 136 | 3.19 | 115 | 18 | 22 | 17450.000 |
| 5 | audi fox | 99.8 | 177.3 | 66.3 | 2507 | 5 | 136 | 3.19 | 110 | 19 | 25 | 15250.000 |
| 6 | audi 100ls | 105.8 | 192.7 | 71.4 | 2844 | 5 | 136 | 3.19 | 110 | 19 | 25 | 17710.000 |
| 7 | audi 5000 | 105.8 | 192.7 | 71.4 | 2954 | 5 | 136 | 3.19 | 110 | 19 | 25 | 18920.000 |
| 8 | audi 4000 | 105.8 | 192.7 | 71.4 | 3086 | 5 | 131 | 3.13 | 140 | 17 | 20 | 23875.000 |
| 9 | audi 5000s (diesel) | 99.5 | 178.2 | 67.9 | 3053 | 5 | 131 | 3.13 | 160 | 16 | 22 | 17859.167 |
Last rows
| | name | wheelbase | carlength | carwidth | curbweight | cylindernumber | enginesize | boreratio | horsepower | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 195 | volvo 144ea | 104.3 | 188.8 | 67.2 | 3034 | 4 | 141 | 3.78 | 114 | 23 | 28 | 13415.0 |
| 196 | volvo 244dl | 104.3 | 188.8 | 67.2 | 2935 | 4 | 141 | 3.78 | 114 | 24 | 28 | 15985.0 |
| 197 | volvo 245 | 104.3 | 188.8 | 67.2 | 3042 | 4 | 141 | 3.78 | 114 | 24 | 28 | 16515.0 |
| 198 | volvo 264gl | 104.3 | 188.8 | 67.2 | 3045 | 4 | 130 | 3.62 | 162 | 17 | 22 | 18420.0 |
| 199 | volvo diesel | 104.3 | 188.8 | 67.2 | 3157 | 4 | 130 | 3.62 | 162 | 17 | 22 | 18950.0 |
| 200 | volvo 145e (sw) | 109.1 | 188.8 | 68.9 | 2952 | 4 | 141 | 3.78 | 114 | 23 | 28 | 16845.0 |
| 201 | volvo 144ea | 109.1 | 188.8 | 68.8 | 3049 | 4 | 141 | 3.78 | 160 | 19 | 25 | 19045.0 |
| 202 | volvo 244dl | 109.1 | 188.8 | 68.9 | 3012 | 6 | 173 | 3.58 | 134 | 18 | 23 | 21485.0 |
| 203 | volvo 246 | 109.1 | 188.8 | 68.9 | 3217 | 6 | 145 | 3.01 | 106 | 26 | 27 | 22470.0 |
| 204 | volvo 264gl | 109.1 | 188.8 | 68.9 | 3062 | 4 | 141 | 3.78 | 114 | 19 | 25 | 22625.0 |